Abstract

We discuss why MMSE estimation arises in lattice-based schemes for approach- 
ing the capacity of linear Gaussian channels, and comment on its properties. 
1 Introduction

Recently, Erez and Zamir [HI E2] have cracked the long-standing problem of achieving 
the capacity of additive white Gaussian noise (AWGN) channels using lattice codes and 
lattice decoding. Their method uses Voronoi codes (nested lattice codes), dither, and 
an MMSE estimation factor $\alpha$ that had previously been introduced in more complex
multiterminal scenarios, such as Costa's "dirty-paper channel." However, they give no
fundamental explanation for why an MMSE estimator, which is seemingly an artifact from
the world of analog communications, plays such a key role in the digital communications
problem of achieving channel capacity.

The principal purpose of this paper is to provide such an explanation, in the lattice-
based context of a mod-$\Lambda$ AWGN channel model. We discuss various properties of
MMSE-based schemes in this application, some of which are unexpected.

MMSE estimators also appear as part of capacity-achieving solutions for more general 
linear Gaussian channel scenarios; e.g., in MMSE-DFE structures (including precoding) 
for ISI channels and generalized MMSE-DFE structures for vector and multi-
user channels I2D|. Some of the explanation for the "canonicality" of MMSE-DFE
structures in these more general scenarios is no doubt information-theoretic [THl.
The observations of this paper complement these results by showing why lattice-type 
codes combine so well with MMSE equalization structures, as shown previously in [T^ I22j. 

*I am grateful to J. M. Cioffi, U. Erez, R. Fischer and R. Zamir for many helpful comments. 



2 Lattice-based coding for the AWGN channel 



Consider the real discrete-time AWGN channel $Y = X + N$, where $\mathsf{E}[X^2] \le S_x$ and
$N$ is independent¹ zero-mean Gaussian noise with variance $S_n$. The capacity is
$C = \frac{1}{2}\log_2(1 + \mathrm{SNR})$ bits per dimension (b/d), where $\mathrm{SNR} = S_x/S_n$. Following Erez and
Zamir jHJ 122], we will show how lattice-based transmission systems can approach the
capacity of this channel at all SNRs.



2.1 Lattices and spheres 

Geometrically, an $N$-dimensional lattice $\Lambda$ is a regular infinite array of points in $\mathbb{R}^N$.
Algebraically, $\Lambda$ is a discrete subgroup of $\mathbb{R}^N$ which spans $\mathbb{R}^N$. A Voronoi region $\mathcal{R}_V(\Lambda)$ of
$\Lambda$ represents the quotient group $\mathbb{R}^N/\Lambda$ by a set of minimum-energy coset representatives
for the cosets of $\Lambda$ in $\mathbb{R}^N$. For any $\mathbf{x} \in \mathbb{R}^N$, "$\mathbf{x} \bmod \Lambda$" denotes the unique element
of $\mathcal{R}_V(\Lambda)$ in the coset $\Lambda + \mathbf{x}$. Geometrically, $\mathbb{R}^N$ is the disjoint union of the translated
Voronoi regions $\{\mathcal{R}_V(\Lambda) + \lambda, \lambda \in \Lambda\}$. The volume $V(\Lambda)$ of $\mathcal{R}_V(\Lambda)$ is therefore the volume
of $\mathbb{R}^N$ associated with each point of $\Lambda$.
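As a concrete illustration of the "mod $\Lambda$" operation, the following minimal sketch assumes the simplest possible lattice, the scaled integer lattice $L\mathbb{Z}^N$, whose Voronoi region is the cube $[-L/2, L/2)^N$. The function name `mod_lattice` and the tie-breaking rule on the cube boundary are assumptions made for this example only; the lattices of interest in this paper are far from cubic.

```python
import math

def mod_lattice(x, L=1.0):
    """Reduce x modulo the cubic lattice L*Z^N, coordinate by coordinate.

    The result lies in the cube [-L/2, L/2)^N, a Voronoi region of L*Z^N
    (boundary ties are broken toward -L/2, an arbitrary choice).
    """
    return [xi - L * math.floor(xi / L + 0.5) for xi in x]

x = [2.25, -1.75, 0.5]
r = mod_lattice(x)                          # coset representative in the cube
lam = [xi - ri for xi, ri in zip(x, r)]     # lattice point that was subtracted
print(r, lam)
```

Every $\mathbf{x}$ is thus split into a lattice point plus a representative in the Voronoi region, which is the decomposition of $\mathbb{R}^N$ into translated Voronoi regions described above.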

As $N \to \infty$, the Voronoi regions of some $N$-dimensional lattices can become more or
less spherical, in various senses. As $N \to \infty$, an $N$-sphere (ball) of squared radius $N\rho^2$
has normalized volume (per two dimensions)

$$V_\otimes(N\rho^2)^{2/N} \to 2\pi e \rho^2.$$

The average energy per dimension of a uniform probability distribution over such an
$N$-sphere goes to $P_\otimes(N\rho^2) = \rho^2$. The probability that an iid Gaussian random $N$-tuple
with zero mean and symbol variance $S_n$ falls outside the $N$-sphere becomes arbitrarily
small for any $S_n < \rho^2$.
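The volume limit can be checked numerically from the standard formula $V_N(r) = \pi^{N/2} r^N / \Gamma(N/2 + 1)$ for the volume of an $N$-ball of radius $r$. The sketch below is only a sanity check; the helper name `ball_normalized_volume` is hypothetical, and the computation is done in the log domain to avoid overflow at large $N$.

```python
import math

def ball_normalized_volume(N, rho2=1.0):
    """V(N*rho^2)^(2/N) for an N-ball of squared radius N*rho^2.

    Uses V_N(r) = pi^(N/2) * r^N / Gamma(N/2 + 1) with r^2 = N*rho^2,
    evaluated through logarithms.
    """
    log_v = 0.5 * N * math.log(math.pi * N * rho2) - math.lgamma(N / 2 + 1)
    return math.exp(2.0 * log_v / N)

limit = 2 * math.pi * math.e    # the claimed limit for rho^2 = 1
for N in (2, 16, 256, 65536):
    print(N, ball_normalized_volume(N) / limit)
```

The printed ratios approach 1 from below as $N$ grows.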

It is known that there exist high-dimensional lattices whose Voronoi regions are quasi-
spherical in the following second moment sense. The normalized second moment of a
compact region $\mathcal{R} \subset \mathbb{R}^N$ of volume $V(\mathcal{R})$ is defined as

$$G(\mathcal{R}) = \frac{P(\mathcal{R})}{V(\mathcal{R})^{2/N}},$$

where $P(\mathcal{R})$ is the average energy per dimension of a uniform probability distribution over
$\mathcal{R}$. The normalized second moment of $\mathcal{R}$ exceeds that of an $N$-sphere. The normalized
second moment of an $N$-sphere decreases monotonically with $N$ and approaches $1/(2\pi e)$ as
$N \to \infty$. Poltyrev (reported in Feder-Zamir [21]) showed that there exist lattices $\Lambda$ such
that $\log_2 2\pi e G(\Lambda)$ is arbitrarily small, where $G(\Lambda)$ denotes the normalized second moment
of $\mathcal{R}_V(\Lambda)$. Such lattices are said to be "good for quantization," or "good for shaping."
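For the $N$-sphere itself the normalized second moment has a closed form: a uniform distribution over a ball of radius $r$ has energy $r^2/(N+2)$ per dimension, which together with the ball volume gives $G_\otimes(N) = \Gamma(N/2+1)^{2/N} / ((N+2)\pi)$. The following sketch tabulates this quantity and the corresponding value of $\log_2 2\pi e G$; the function names are hypothetical, and the formula is a standard fact rather than something taken from the references above.

```python
import math

def sphere_second_moment(N):
    """Normalized second moment of an N-ball: Gamma(N/2+1)^(2/N) / ((N+2)*pi)."""
    return math.exp(2.0 * math.lgamma(N / 2 + 1) / N) / ((N + 2) * math.pi)

def log2_2pieG(G):
    """log2(2*pi*e*G); zero exactly when G equals the limit 1/(2*pi*e)."""
    return math.log2(2 * math.pi * math.e * G)

for N in (1, 2, 8, 24, 1024):
    G = sphere_second_moment(N)
    print(N, G, log2_2pieG(G))
```

For $N = 1$ the "sphere" is an interval and the formula returns $1/12$, the familiar second moment of a uniform scalar quantizer cell; the values then decrease toward $1/(2\pi e)$.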

Poltyrev also showed that there exist high-dimensional lattices whose Voronoi
regions are quasi-spherical in the sense that the probability that an iid Gaussian noise
$N$-tuple with symbol variance $S_n$ falls outside the Voronoi region $\mathcal{R}_V(\Lambda)$ is arbitrarily
small as long as

$$S_n < \frac{V(\Lambda)^{2/N}}{2\pi e}.$$

Such lattices are said to be "good for AWGN channel coding," or "sphere-bound-achieving" 

m 

1 Note that without the independence of N, the "additive" property is vacuous, since for any real- 
input, real-output channel we may define N = Y — X, and then express Y as Y = X + N. We exploit 
this idea later. 



2.2 Mod-lattice transmission and capacity 



We now show that the mod-$\Lambda$ transmission system shown in Figure 1 can approach the
channel capacity $C = \frac{1}{2}\log_2(1 + S_x/S_n)$ b/d arbitrarily closely, provided that
$G(\Lambda) \approx 1/(2\pi e)$ and $f(\mathbf{Y})$ is an MMSE estimator of $\mathbf{X}$.
Figure 1. Mod-$\Lambda$ transmission system over an AWGN channel.

This system is based on an $N$-dimensional lattice $\Lambda$ whose Voronoi region $\mathcal{R}_V(\Lambda)$
has volume $V(\Lambda)$, average energy per dimension $P(\Lambda) = S_x$ under a uniform probability
distribution over $\mathcal{R}_V(\Lambda)$, and thus normalized second moment $G(\Lambda) = P(\Lambda)/V(\Lambda)^{2/N}$.

The $N$-dimensional input vector $\mathbf{X}$ is restricted to the Voronoi region $\mathcal{R}_V(\Lambda)$. The
output vector $\mathbf{Y}$ is mapped by some function $f$ to another vector $f(\mathbf{Y}) \in \mathbb{R}^N$, which is
then mapped modulo $\Lambda$ to $\mathbf{Y}' = f(\mathbf{Y}) \bmod \Lambda$, also in the Voronoi region $\mathcal{R}_V(\Lambda)$.

Our main result is that capacity can be approached in the system of Figure 1 if and
only if the lattice $\Lambda$ is "good for shaping" and the function $f(\mathbf{Y})$ is an MMSE estimator.
(The sufficiency of these conditions was shown in ffi I22|.)

As a first step, we derive a lower bound: 

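Theorem 1 (Mod-$\Lambda$ channel capacity) The capacity $C(\Lambda, f)$ of the mod-$\Lambda$ transmis-
sion system of Figure 1 is lowerbounded by

$$C(\Lambda, f) \ge C - \frac{1}{2}\log_2 2\pi e G(\Lambda) - \frac{1}{2}\log_2 \frac{S_{e,f}}{S_e} \quad \text{b/d},$$

where $C = \frac{1}{2}\log_2(1 + \mathrm{SNR})$ b/d is the capacity of the underlying AWGN channel, $G(\Lambda)$
is the normalized second moment of $\mathcal{R}_V(\Lambda)$, and $S_{e,f}$ and $S_e$ are the average energies per
dimension of $\mathbf{E}_f = f(\mathbf{Y}) - \mathbf{X}$ and of $\mathbf{E} = \hat{\mathbf{X}}(\mathbf{Y}) - \mathbf{X}$, respectively, where $\hat{\mathbf{X}}(\mathbf{Y})$ is the
linear MMSE estimator of $\mathbf{X}$ given $\mathbf{Y}$.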

The key to the proof of this theorem is the introduction of a dither variable $\mathbf{U}$ that
is known to both transmitter and receiver, and whose probability distribution is uniform
over the Voronoi region $\mathcal{R}_V(\Lambda)$, as in [SJ 122]. Given a data vector $\mathbf{V} \in \mathcal{R}_V(\Lambda)$, the
channel input is taken as

$$\mathbf{X} = \mathbf{V} + \mathbf{U} \bmod \Lambda.$$

This makes $\mathbf{X}$ a uniform random variable over $\mathcal{R}_V(\Lambda)$, statistically independent of $\mathbf{V}$.
This property follows from the following lemma:²

2 We call this the crypto lemma because if we take X as plaintext, N as a cryptographic key, and 
Y = X + N as the encrypted message, then the encrypted message is independent of the plaintext 
provided that the key is uniform, so no information can be obtained about the plaintext from the 
encrypted message without the key. On the other hand, given the key, the plaintext may be easily 
recovered from the encrypted message via X = Y — N. This is the principle of the one-time pad, which, 
as Shannon showed, is essentially the only way to achieve perfect secrecy in a cryptographic system. 



Lemma 2 (Crypto lemma) Let $G$ be a compact abelian group³ with group operation
$+$, and let $Y = X + N$, where $X$ and $N$ are random variables over $G$ and $N$ is independent
of $X$ and uniform over $G$. Then $Y$ is independent of $X$ and uniform over $G$.



Proof Since $y - x$ runs through $G$ as $y$ runs through $G$ and $p_N(n)$ is constant over $n \in G$,
the distribution $p_{Y|X}(y|x) = p_N(y - x)$ is constant over $y \in G$ for any $x \in G$. □
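The lemma is easy to verify exactly on a small finite group. The sketch below assumes the cyclic group $\mathbb{Z}_5$ (finite, hence compact, and abelian) purely for illustration, and uses exact rational arithmetic; the function name `output_given_input` and the particular non-uniform noise distribution are hypothetical choices.

```python
from fractions import Fraction

def output_given_input(p_noise, x):
    """Conditional distribution p_{Y|X}(y|x) = p_N(y - x) on the group Z_m."""
    m = len(p_noise)
    return [p_noise[(y - x) % m] for y in range(m)]

m = 5
uniform = [Fraction(1, m)] * m
skewed = [Fraction(1, 2)] + [Fraction(1, 8)] * 4   # a non-uniform "key"

uniform_rows = [output_given_input(uniform, x) for x in range(m)]
skewed_rows = [output_given_input(skewed, x) for x in range(m)]

# With uniform noise every row is the same uniform distribution, so Y is
# uniform and independent of X; with skewed noise the rows depend on x.
print(all(row == uniform for row in uniform_rows))
print(skewed_rows[0] == skewed_rows[1])
```

With a uniform $N$ the conditional distribution of $Y$ does not depend on $x$, as the proof asserts; with a non-uniform $N$ the rows are cyclic shifts of one another and therefore reveal $x$.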

One effect of the dither $\mathbf{U}$ is thus to ensure that the channel input $\mathbf{X} = \mathbf{V} + \mathbf{U} \bmod \Lambda$
is uniform over $\mathcal{R}_V(\Lambda)$ and thus has average energy per dimension $P(\Lambda) = S_x$. A second
and more important effect is to make $\mathbf{X}$ and thus also $\mathbf{Y} = \mathbf{X} + \mathbf{N}$ independent of $\mathbf{V}$.

The dither may be subtracted out at the output of the channel, mod $\Lambda$, to give

$$\mathbf{Z} = f(\mathbf{Y}) - \mathbf{U} \bmod \Lambda.$$

The end-to-end channel is illustrated in Figure 2. 
Figure 2. Creation of a mod-$\Lambda$ channel $\mathbf{Z} = f(\mathbf{Y}) - \mathbf{U} \bmod \Lambda$ using dither.

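Now let us regard $f(\mathbf{Y})$ as an estimator of $\mathbf{X}$, and define the estimation error as
$\mathbf{E}_f = f(\mathbf{Y}) - \mathbf{X}$. Since $\mathbf{Y}$ and $\mathbf{X}$ are independent of $\mathbf{V}$, so is $\mathbf{E}_f$. Then

$$\mathbf{Z} = \mathbf{X} + \mathbf{E}_f - \mathbf{U} = \mathbf{V} + \mathbf{E}_f \bmod \Lambda.$$

In short, we have created a mod-$\Lambda$ additive noise channel $\mathbf{Z} = \mathbf{V} + \mathbf{E}_f \bmod \Lambda$, where $\mathbf{E}_f$
is independent of $\mathbf{V}$. This equivalent channel is illustrated in Figure 3.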
Figure 3. Equivalent mod-$\Lambda$ additive noise channel $\mathbf{Z} = \mathbf{V} + \mathbf{E}_f \bmod \Lambda$.
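The equivalence between the dithered system and the additive-noise channel can be checked by simulation. The following sketch is a toy under explicit assumptions: the lattice is the one-dimensional lattice $L\mathbb{Z}$ (so $\mathcal{R}_V(\Lambda) = [-L/2, L/2)$ and $P(\Lambda) = L^2/12$), the estimator is linear, $f(Y) = \alpha Y$, and all names (`mod_L`, `dithered_channel`) and parameter values are hypothetical.

```python
import math
import random

def mod_L(x, L):
    """Reduce a real number modulo the lattice L*Z into [-L/2, L/2)."""
    return x - L * math.floor(x / L + 0.5)

def dithered_channel(v, L, alpha, s_n, rng):
    """One use of the dithered mod-lattice channel with f(Y) = alpha*Y."""
    u = rng.uniform(-L / 2, L / 2)          # dither, known at both ends
    x = mod_L(v + u, L)                     # channel input X = V + U mod L
    y = x + rng.gauss(0.0, math.sqrt(s_n))  # AWGN channel
    z = mod_L(alpha * y - u, L)             # receiver output Z
    e_f = alpha * y - x                     # estimation error E_f = f(Y) - X
    return x, z, e_f

rng = random.Random(1)
L, s_n = 4.0, 0.5
s_x = L * L / 12                            # P(Lambda) for the lattice L*Z
alpha = s_x / (s_x + s_n)

worst, energy, trials = 0.0, 0.0, 10000
for _ in range(trials):
    v = rng.uniform(-L / 2, L / 2)
    x, z, e_f = dithered_channel(v, L, alpha, s_n, rng)
    worst = max(worst, abs(mod_L(z - (v + e_f), L)))   # Z - (V + E_f) mod L
    energy += x * x / trials

print(worst, energy, s_x)
```

The residual `worst` is at the level of floating-point rounding, confirming $Z = V + E_f \bmod \Lambda$ sample by sample, and the empirical input energy is close to $P(\Lambda)$ regardless of how $V$ is distributed.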

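As is well known, the capacity of an additive-noise channel $\mathbf{Z} = \mathbf{V} + \mathbf{E}_f \bmod \Lambda$ is
achieved when the input distribution is uniform over $\mathcal{R}_V(\Lambda)$, in which case the output
distribution is uniform as well, by the crypto lemma. The capacity is equal to

$$C(\Lambda, f) = \frac{1}{N}\left(h(\mathbf{Z}) - h(\mathbf{Z} \mid \mathbf{V})\right) = \frac{1}{N}\left(\log_2 V(\Lambda) - h(\mathbf{E}'_f)\right) \quad \text{b/d},$$

where $h(\mathbf{Z}) = \log_2 V(\Lambda)$ is the differential entropy of a uniform distribution over a region
of volume $V(\Lambda)$, and $h(\mathbf{E}'_f)$ is the differential entropy of the $\Lambda$-aliased additive noise
$\mathbf{E}'_f = \mathbf{E}_f \bmod \Lambda$. Now since $\mathbf{E}'_f$ is the result of applying the many-to-one mod-$\Lambda$ map to
$\mathbf{E}_f$, we have

$$h(\mathbf{E}'_f) \le h(\mathbf{E}_f).$$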

3 The group $G$ is required to be compact so that its Haar (translation-invariant) measure $\mu(G)$ is finite
and thus normalizable to a uniform probability distribution over $G$. However, $G$ need not be abelian.



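Moreover, if $\mathbf{E}_f$ has average energy per dimension $S_{e,f}$, then we have

$$h(\mathbf{E}_f) \le \frac{N}{2}\log_2 2\pi e S_{e,f},$$

the differential entropy of an iid zero-mean Gaussian distribution with the same average
energy. Combining these results, using $V(\Lambda)^{2/N} = P(\Lambda)/G(\Lambda)$ and $P(\Lambda) = S_x$, we have

$$C(\Lambda, f) \ge \frac{1}{N}\log_2 V(\Lambda) - \frac{1}{2}\log_2 2\pi e S_{e,f} = \frac{1}{2}\log_2 \frac{S_x}{S_{e,f}} - \frac{1}{2}\log_2 2\pi e G(\Lambda) \quad \text{b/d}.$$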



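The linear MMSE estimator $\hat{\mathbf{X}}(\mathbf{Y})$ of $\mathbf{X}$ is $\hat{\mathbf{X}}(\mathbf{Y}) = \alpha\mathbf{Y}$, where

$$\alpha = \frac{S_x}{S_x + S_n} = \frac{\mathrm{SNR}}{1 + \mathrm{SNR}}.$$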



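By the orthogonality principle of MMSE estimation theory, the linear MMSE estimation
error $\mathbf{E} = \mathbf{X} - \alpha\mathbf{Y} = (1 - \alpha)\mathbf{X} - \alpha\mathbf{N}$ is then uncorrelated with $\mathbf{Y}$.⁴ (This is the
negative of the error $\hat{\mathbf{X}}(\mathbf{Y}) - \mathbf{X}$ of Theorem 1, and has the same energy.) The average energy
of the estimation error per dimension becomes

$$S_e = (1 - \alpha)^2 S_x + \alpha^2 S_n = \frac{S_x S_n}{S_x + S_n} = \alpha S_n$$

(see footnote). Finally, since $S_x/S_e = S_y/S_n = 1 + \mathrm{SNR}$, we have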



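$$C(\Lambda, f) \ge \frac{1}{2}\log_2 \frac{S_x}{S_e} + \frac{1}{2}\log_2 \frac{S_e}{S_{e,f}} - \frac{1}{2}\log_2 2\pi e G(\Lambda) = C - \frac{1}{2}\log_2 2\pi e G(\Lambda) - \frac{1}{2}\log_2 \frac{S_{e,f}}{S_e} \quad \text{b/d}.$$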



This completes the proof of Theorem 1. □ 
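The MMSE relations used in the last step of the proof can be checked with a few lines of arithmetic. This is only a numerical restatement of the formulas above; the function name `mmse_parameters` and the example values $S_x = 3$, $S_n = 1$ are hypothetical.

```python
def mmse_parameters(s_x, s_n):
    """Return (alpha, S_e) for the linear MMSE estimate alpha*Y of X from Y = X + N."""
    alpha = s_x / (s_x + s_n)
    s_e = (1 - alpha) ** 2 * s_x + alpha ** 2 * s_n
    return alpha, s_e

s_x, s_n = 3.0, 1.0
snr = s_x / s_n
alpha, s_e = mmse_parameters(s_x, s_n)

print(alpha, snr / (1 + snr))              # alpha = SNR / (1 + SNR)
print(s_e, alpha * s_n)                    # S_e = alpha * S_n
print(s_x / s_e, 1 + snr)                  # S_x / S_e = 1 + SNR
print(s_x - alpha * (s_x + s_n))           # E[(X - alpha*Y) Y] = S_x - alpha*S_y
```

The last line evaluates the correlation between the error $X - \alpha Y$ and $Y$, which vanishes by the orthogonality principle.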
Remark 1 (dither is unnecessary). Evidently a channel $\mathbf{Z} = \mathbf{V} + \mathbf{u} + \mathbf{E}_f \bmod \Lambda$ with
a fixed dither vector $\mathbf{u} \in \mathcal{R}_V(\Lambda)$ has the same capacity $C(\Lambda, f)$. Therefore introducing
the random dither variable $\mathbf{U}$ is just a tactic to prove Theorem 1; dither is not actually
4 These relations are illustrated by the "Pythagorean" right triangle shown in Figure 4 below, which
follows from interpreting covariances as inner products of vectors in a two-dimensional Hilbert space.
Since $\mathsf{E}[XN] = 0$, the two vectors corresponding to $X$ and $N$ are orthogonal. Their squared lengths
are given by $\mathsf{E}[X^2] = S_x$ and $\mathsf{E}[N^2] = S_n$. The hypotenuse corresponds to the sum $Y = X + N$,
and has squared length $S_y = S_x + S_n$. Since $\mathsf{E}[XY] = S_x$, the projection of $Y$ onto $X$ is
$\hat{Y}(X) = (\mathsf{E}[XY]/\mathsf{E}[X^2])X = X$, and the projection of $X$ onto $Y$ is $\hat{X}(Y) = (\mathsf{E}[XY]/\mathsf{E}[Y^2])Y = \alpha Y$. Then
$E = X - \hat{X}(Y) = X - \alpha Y$ is orthogonal to $Y$. The inner right triangle in Figure 4 with sides $(\hat{X}(Y), E, X)$
is similar, so since $S_x = \alpha S_y$ the squared lengths of its sides are $(S_{\hat{x}} = \alpha S_x, S_e = \alpha S_n, S_x = \alpha S_y)$,
respectively.
Figure 4. "Pythagorean" right triangle with sides $(X, N, Y)$,
with similar inner right triangle with sides $(\hat{X}(Y) = \alpha Y, E = X - \hat{X}(Y), X)$.



needed to achieve $C(\Lambda, f)$. However, dither is key to decoupling the Shannon and the
Wiener problems.⁵ □

Remark 2 (MMSE estimation and bias). Notice that the signal-to-noise ratio of the
channel $\mathbf{Z} = \mathbf{V} + \mathbf{E} \bmod \Lambda$ is $S_x/S_e = S_y/S_n = 1 + \mathrm{SNR}$, and moreover this channel has
no bias. Thus the MMSE factor $\alpha$ and random dither increase the effective signal-to-
noise ratio from SNR to $1 + \mathrm{SNR}$ without introducing bias. This is evidently a different
way of approaching capacity than that given in [2], where the apparent $\mathrm{SNR}_{\text{MMSE-DFE}}$
was discounted to $\mathrm{SNR}_{\text{MMSE-DFE},U} = \mathrm{SNR}_{\text{MMSE-DFE}} - 1$ to account for bias. □

Remark 3 ("dirty-paper" capacity). This approach easily extends to give a construc-
tive proof of Costa's result fS] that channel interference known to the transmitter does
not reduce capacity; see, e.g., |22[ IT]. Let the channel model be $\mathbf{Y} = \mathbf{X} + \mathbf{N} + \mathbf{S}$, where
$\mathbf{S}$ is an arbitrary interference vector known to the transmitter. Then let the channel
input be $\mathbf{X} = \mathbf{V} + \mathbf{U} - \alpha\mathbf{S} \bmod \Lambda$. The channel input is still uniform and independent
of $\mathbf{V}$, by the crypto lemma, while the effect of the interference $\mathbf{S}$ is entirely cancelled in
$\mathbf{Z} = \alpha\mathbf{Y} - \mathbf{U} = \mathbf{V} + \mathbf{E}_f \bmod \Lambda$. Thus the receiver needs to know nothing about the
interference, the equivalent channel model is the same, and $C(\Lambda, f)$ is unaffected. □
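The interference cancellation in Remark 3 is an algebraic identity, so it can be confirmed sample by sample. The sketch below again assumes the toy one-dimensional lattice $L\mathbb{Z}$; the names `mod_L` and `dirty_paper_use`, the interference range, and the parameter values are all hypothetical.

```python
import math
import random

def mod_L(x, L):
    """Reduce a real number modulo the lattice L*Z into [-L/2, L/2)."""
    return x - L * math.floor(x / L + 0.5)

def dirty_paper_use(v, s, L, alpha, s_n, rng):
    """One channel use with interference s known only to the transmitter."""
    u = rng.uniform(-L / 2, L / 2)            # shared dither
    x = mod_L(v + u - alpha * s, L)           # X = V + U - alpha*S mod L
    n = rng.gauss(0.0, math.sqrt(s_n))
    y = x + n + s                             # channel adds noise and interference
    z = mod_L(alpha * y - u, L)               # receiver never uses s
    e = alpha * (x + n) - x                   # error -(1 - alpha)*X + alpha*N
    return x, z, e

rng = random.Random(7)
L, s_n = 4.0, 0.5
alpha = (L * L / 12) / (L * L / 12 + s_n)

worst, peak = 0.0, 0.0
for _ in range(10000):
    v = rng.uniform(-L / 2, L / 2)
    s = rng.uniform(-100.0, 100.0)            # interference much larger than the signal
    x, z, e = dirty_paper_use(v, s, L, alpha, s_n, rng)
    worst = max(worst, abs(mod_L(z - (v + e), L)))
    peak = max(peak, abs(x))

print(worst, peak)
```

Even with interference far stronger than the signal, the channel input stays inside the Voronoi region (so the transmit power is unchanged) and the receiver output equals $V + E \bmod \Lambda$ up to rounding.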

Theorem 1 implies that the capacity $C = \frac{1}{2}\log_2(1 + \mathrm{SNR})$ can be approached arbi-
trarily closely by the mod-$\Lambda$ channel of Figure 1 if $\log_2 2\pi e G(\Lambda) \to 0$ and $f(\mathbf{Y})$ is the
linear MMSE estimator $\hat{\mathbf{X}}(\mathbf{Y}) = \alpha\mathbf{Y}$, which is the main result of Erez and Zamir [H].

We now show that the conditions $\log_2 2\pi e G(\Lambda) \to 0$ and $S_{e,f} = S_e$ are not only
sufficient but also necessary to reach capacity. Briefly, the arguments are as follows:

1. The differential entropy per dimension of $\mathbf{X}$ and $\mathbf{Z}$, namely

$$\frac{1}{N}h(\mathbf{X}) = \frac{1}{N}h(\mathbf{Z}) = \frac{1}{2}\log_2 V(\Lambda)^{2/N} = \frac{1}{2}\log_2 2\pi e S_x - \frac{1}{2}\log_2 2\pi e G(\Lambda),$$

goes to $\frac{1}{2}\log_2 2\pi e S_x$ if and only if $\log_2 2\pi e G(\Lambda) \to 0$. This condition is necessary because
the capacity of an AWGN channel with input power constraint $S_x$ can be approached
arbitrarily closely only if $h(\mathbf{X})/N$ approaches $\frac{1}{2}\log_2 2\pi e S_x$.

Remark 4 (Gaussian approximation principle). The differential entropy per dimension of any ran-
dom vector $\mathbf{X}$ with average energy per dimension $S_x$ is less than or equal to $\frac{1}{2}\log_2 2\pi e S_x$,
with equality if and only if $\mathbf{X}$ is iid Gaussian. Therefore if $\mathbf{X}_n$ is a sequence of ran-
dom vectors of dimension $N(n) \to \infty$ and average energy per dimension $S_x$ such that
$h(\mathbf{X}_n)/N(n) \to \frac{1}{2}\log_2 2\pi e S_x$, we say that the sequence $\mathbf{X}_n$ is Gaussian in the limit. Re-
stating the above argument, if $\mathbf{X}_n$ is uniform over $\mathcal{R}_V(\Lambda_n)$, then $\mathbf{X}_n$ is Gaussian in the
limit if and only if $\log_2 2\pi e G(\Lambda_n) \to 0$.⁶ □

2. The channel output $\mathbf{Y} = \mathbf{X} + \mathbf{N}$ is then also Gaussian in the limit, so the linear
MMSE estimator $\hat{\mathbf{X}}(\mathbf{Y}) = \alpha\mathbf{Y}$ becomes a true MMSE estimator in the limit. The
MMSE estimation error $\mathbf{E} = -(1 - \alpha)\mathbf{X} + \alpha\mathbf{N}$ becomes Gaussian in the limit with
symbol variance $S_e = \alpha S_n$, and becomes independent of $\mathbf{Y}$. In order that $C(\Lambda, f) \to C$,
it is then necessary that $S_{e,f} = S_e$, which by definition implies that $f(\mathbf{Y})$ is an MMSE
estimator.⁷

5 This is analogous to the tactic used by Elias to prove that binary linear block codes can achieve
the capacity of a binary input-symmetric channel, namely the introduction of a random translate $\mathcal{C} + \mathbf{U}$
of a binary linear block code $\mathcal{C}$ of length $N$, where $\mathbf{U}$ is a random uniform binary $N$-tuple in $(\mathbb{F}_2)^N$.

6 Zamir and Feder [21] show that if $\mathbf{X}_n$ is uniform over an $N(n)$-dimensional region $\mathcal{R}_n$ of average
energy $S_x$ and $G(\mathcal{R}_n) \to 1/(2\pi e)$, then the normalized divergence $\frac{1}{N(n)}D(\mathbf{X}_n \| \mathbf{N}_n) \to 0$, where $\mathbf{N}_n$ is
an iid Gaussian random vector with zero mean and variance $S_x$. They go on to show that this implies
that any finite-dimensional projection of $\mathbf{X}_n$ converges in distribution to an iid Gaussian vector.

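7 Since $\mathbf{E}_f = \mathbf{E} + (f(\mathbf{Y}) - \hat{\mathbf{X}}(\mathbf{Y}))$ and $\mathbf{Y}$ and $\mathbf{E}$ are independent, $S_{e,f} = S_e + \frac{1}{N}\mathsf{E}[\|f(\mathbf{Y}) - \hat{\mathbf{X}}(\mathbf{Y})\|^2]$.
Thus $f(\mathbf{Y})$ is an MMSE estimator if and only if $\mathsf{E}[\|f(\mathbf{Y}) - \hat{\mathbf{X}}(\mathbf{Y})\|^2] = 0$.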



In summary, these two conditions are necessary as well as sufficient: 



Theorem 3 (Necessary conditions to approach C) The capacity of the mod-$\Lambda$
channel of Figure 1 approaches $C$ if and only if $\log_2 2\pi e G(\Lambda) \to 0$ and $f(\mathbf{Y})$ is an MMSE
estimator of $\mathbf{X}$ given $\mathbf{Y}$.
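To see how the estimator term of Theorem 1 behaves, the following sketch evaluates the lower bound for the family of linear maps $f(\mathbf{Y}) = \beta\mathbf{Y}$, for which $S_{e,f} = (1 - \beta)^2 S_x + \beta^2 S_n$. It assumes $S_n = 1$ and treats the shaping term $\frac{1}{2}\log_2 2\pi e G(\Lambda)$ as a given number; the function name `capacity_lower_bound` and its parameters are hypothetical.

```python
import math

def capacity_lower_bound(snr, beta, shaping_loss=0.0):
    """Theorem 1 lower bound in b/d for f(Y) = beta*Y, normalized to S_n = 1.

    shaping_loss stands for (1/2) log2(2*pi*e*G(Lambda)).
    """
    s_x, s_n = snr, 1.0
    alpha = s_x / (s_x + s_n)
    s_e = alpha * s_n                                   # MMSE error energy
    s_ef = (1 - beta) ** 2 * s_x + beta ** 2 * s_n      # error energy of beta*Y
    c = 0.5 * math.log2(1 + snr)
    return c - shaping_loss - 0.5 * math.log2(s_ef / s_e)

snr = 3.0
alpha = snr / (1 + snr)
for beta in (0.5, alpha, 0.9, 1.0):
    print(beta, capacity_lower_bound(snr, beta))
```

The bound is largest at $\beta = \alpha$, where it equals $C$ when the shaping term vanishes; at $\beta = 1$ (no scaling) it drops to $\frac{1}{2}\log_2 \mathrm{SNR}$.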

Remark 5 (MMSE estimation and lattice decoding). One interpretation of the Erez-
Zamir result is that the scaling introduced by the MMSE estimator is somehow essential
for lattice decoding of a fine-grained coding lattice $\Lambda_c$. Theorem 3 shows however that
in the mod-$\Lambda$ channel an MMSE estimator is necessary to achieve capacity, quite apart
from any particular coding and decoding scheme. □

Remark 6 (aliasing becomes negligible). Under these conditions, Theorem 1 says
that $C(\Lambda, f) \ge C$. Since $C(\Lambda, f)$ cannot exceed $C$, this implies that all inequalities in
the proof of Theorem 1 must tend to equality, and in particular that

$$\frac{h(\mathbf{E}')}{N} \to \frac{h(\mathbf{E})}{N} \to \frac{1}{2}\log_2 2\pi e S_e,$$

where $\mathbf{E}' = \mathbf{E} \bmod \Lambda$ is the $\Lambda$-aliased version of the estimation error $\mathbf{E}$. So not only
must $\mathbf{E}$ become Gaussian in the limit, i.e., $h(\mathbf{E})/N \to \frac{1}{2}\log_2 2\pi e S_e$, but also $\mathbf{E}'$ must
tend to $\mathbf{E}$, which means that the effect of the mod-$\Lambda$ aliasing must become negligible.
This is as expected, since $\mathbf{E}$ is Gaussian in the limit with symbol variance $S_e$ and $\mathcal{R}_V(\Lambda)$
is quasi-spherical with average energy per dimension $S_x > S_e$. □



2.3 Voronoi codes 

A Voronoi code $\mathcal{C}((\Lambda_c + \mathbf{u})/\Lambda) = (\Lambda_c + \mathbf{u}) \cap \mathcal{R}_V(\Lambda)$ is the set of points in a translate
$\Lambda_c + \mathbf{u}$ of an $N$-dimensional "coding lattice" $\Lambda_c$ that lie in the Voronoi region $\mathcal{R}_V(\Lambda)$ of
a "shaping" sublattice $\Lambda \subset \Lambda_c$. (Such codes were called "Voronoi codes" in [I], "Voronoi
constellations" in [TT], and "nested lattice codes" in [SJ 123 EE]. Here we will use the
original term.)

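A Voronoi code has $|\Lambda_c/\Lambda| = V(\Lambda)/V(\Lambda_c)$ code points, and thus rate

$$R(\Lambda_c/\Lambda) = \frac{1}{N}\log_2 \frac{V(\Lambda)}{V(\Lambda_c)} = \frac{1}{2}\log_2 \frac{V(\Lambda)^{2/N}}{V(\Lambda_c)^{2/N}} \quad \text{b/d}.$$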
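A very small Voronoi code can be enumerated directly. The sketch below assumes the toy nested pair $\Lambda_c = \mathbb{Z}^2$, $\Lambda = 4\mathbb{Z}^2$, which is convenient to enumerate but is neither good for shaping nor good for coding; the function names and the two dither vectors are hypothetical.

```python
import itertools
import math

def voronoi_code(M, u):
    """Points of Z^N + u lying in the Voronoi cube [-M/2, M/2)^N of M*Z^N."""
    axes = []
    for ui in u:
        # M coset representatives per axis, each reduced into [-M/2, M/2)
        reps = [(k + ui) - M * math.floor((k + ui) / M + 0.5) for k in range(M)]
        axes.append(sorted(reps))
    return list(itertools.product(*axes))

def rate_and_energy(code):
    """Rate in b/d and average energy per dimension of a finite code."""
    N = len(code[0])
    rate = math.log2(len(code)) / N
    energy = sum(c * c for point in code for c in point) / (len(code) * N)
    return rate, energy

centered = voronoi_code(4, (0.5, 0.5))
undithered = voronoi_code(4, (0.0, 0.0))
print(len(centered), rate_and_energy(centered))
print(len(undithered), rate_and_energy(undithered))
print(4 ** 2 / 12)                      # P(Lambda) for the cube of side 4
```

Both codes have $V(\Lambda)/V(\Lambda_c) = 16$ points and rate 2 b/d, in agreement with the rate formula. In this low-dimensional toy the average energy depends visibly on the translate $\mathbf{u}$ and brackets $P(\Lambda) = 4/3$.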

Erez and Zamir (SJ 122] have shown rigorously (not employing the Gaussian approxi-
mation principle) that there exists a random ensemble $\mathcal{C}((\Lambda_c + \mathbf{U})/\Lambda)$ of dithered Voronoi
codes that can approach the capacity $C(\Lambda)$ of the mod-$\Lambda$ transmission system of Figure
1 arbitrarily closely, if $f(\mathbf{Y}) = \hat{\mathbf{X}}(\mathbf{Y}) = \alpha\mathbf{Y}$. The decoder may be the usual minimum-
Euclidean-distance decoder, even though the effective noise $\mathbf{E} = -(1 - \alpha)\mathbf{X} + \alpha\mathbf{N}$ is not
Gaussian.

If $C(\Lambda) \approx C$ and $P(\Lambda) = S_x$, this implies that $2\pi e G(\Lambda) \approx 1$; i.e., $\Lambda$ is "good for
shaping." Furthermore, since the effective noise has variance $S_e$, if the error probability
is arbitrarily small and $R(\Lambda_c/\Lambda) \approx C = \frac{1}{2}\log_2 S_x/S_e$, then

$$\log_2 S_e \approx \log_2 \frac{V(\Lambda_c)^{2/N}}{2\pi e};$$

i.e., $\Lambda_c$ is "good for AWGN channel coding," or "sphere-bound-achieving."



The ensemble $\mathcal{C}((\Lambda_c + \mathbf{U})/\Lambda)$ is an ensemble of fixed-dither Voronoi codes
$\mathcal{C}((\Lambda_c + \mathbf{u})/\Lambda)$. The average probability of decoding error $\Pr_{\mathbf{U}}(E) = \mathsf{E}_{\mathbf{U}}[\Pr_{\mathbf{u}}(E)]$ is arbitrarily
small over this ensemble, using a decoder that is appropriate for random dither (i.e.,
minimum-distance decoding). This implies not only that there exists at least one fixed-
dither code $\mathcal{C}((\Lambda_c + \mathbf{u})/\Lambda)$ such that $\Pr_{\mathbf{u}}(E) \le \Pr_{\mathbf{U}}(E)$, using the same decoder, but also
that at least a fraction $1 - \varepsilon$ of the fixed-dither codes have $\Pr_{\mathbf{u}}(E) \le \frac{1}{\varepsilon}\Pr_{\mathbf{U}}(E)$; i.e.,
almost all fixed-dither codes have low $\Pr_{\mathbf{u}}(E)$.

This result is somewhat counterintuitive, since for fixed dither $\mathbf{u}$, $\mathbf{X}$ is not independent
of $\mathbf{V}$; indeed, there is a one-to-one correspondence given by $\mathbf{X} = \mathbf{V} + \mathbf{u} \bmod \Lambda$. Therefore,
the error

$$\mathbf{E} = -(1 - \alpha)\mathbf{X} + \alpha\mathbf{N} = -(1 - \alpha)(\mathbf{V} + \mathbf{u} \bmod \Lambda) + \alpha\mathbf{N}$$

is not independent of $\mathbf{V}$; i.e., there is bias in the equivalent channel output $\mathbf{Z} = \mathbf{V} +
\mathbf{E} \bmod \Lambda$. Even so, we see that capacity can be achieved by a suboptimum decoder which
ignores bias.

Since almost all fixed-dither codes achieve capacity, we may as well use the code
$\mathcal{C}((\Lambda_c + \mathbf{u})/\Lambda)$ that has minimum average energy $S_{\min} \le P(\Lambda) = S_x$ per dimension. But
if $S_{\min} < S_x$, then we could achieve a rate greater than the capacity of an AWGN channel
with signal-to-noise ratio $S_{\min}/S_n < S_x/S_n$. We conclude that the average energy per
dimension of $\mathcal{C}((\Lambda_c + \mathbf{u})/\Lambda)$ cannot be materially less than $S_x = P(\Lambda)$ for any $\mathbf{u}$, and thus
must be approximately $S_x$ for almost all values of the dither $\mathbf{u}$, in order for the average
over $\mathbf{U}$ to be $S_x$. In summary:

Theorem 4 (Average energy of Voronoi codes) If $\mathcal{C}((\Lambda_c + \mathbf{u})/\Lambda)$ is a capacity-achieving
Voronoi code, then $\Lambda_c$ is good for AWGN channel coding, $\Lambda$ is good for shaping, the de-
coder may ignore bias, and the average energy per dimension of $\mathcal{C}((\Lambda_c + \mathbf{u})/\Lambda)$ is $\approx P(\Lambda)$.

Remark 7 (Average energy of Voronoi codes). Theorem 4 shows that the hope
that one could find particular Voronoi codes with average energy $S_x - S_e$ was
misguided. For Voronoi codes, the original "continuous approximation" of JU] holds, not
the "improved continuous approximation" of [T2].

Remark 8 (observations on output scaling). It is surprising that a decoder for
Voronoi codes which first scales the received signal by $\alpha$ and then does lattice decoding
should perform better than one that just does lattice decoding. Optimum (ML) decoding
on this channel is minimum-distance (MD) decoding, and ordinary lattice decoding is
equivalent to minimum-distance decoding except on the boundary of the support region.

Scaling by $\alpha$ seems excessive. Scaling the output by $\alpha$ reduces the received variance
to $S_{\hat{x}} = \alpha^2 S_y = \alpha S_x$, less than the input variance. This means that the scaled output $\alpha\mathbf{Y}$
is almost surely going to lie in a spherical shell of average energy per dimension $\approx \alpha S_x$,
whereas the code vectors in the Voronoi code $\mathcal{C}(\Lambda_c/\Lambda)$ almost all lie on a spherical shell
of average energy $\approx S_x$. Yet the subsequent lattice decoding to $\mathcal{C}((\Lambda_c + \mathbf{u})/\Lambda)$ works,
even though it seems that the decoder should decode to $\alpha\mathcal{C}((\Lambda_c + \mathbf{u})/\Lambda)$.

These questions about scaling may be resolved if as $N \to \infty$ it suffices to decode
Voronoi codes based on angles, ignoring magnitudes. Then whether the decoder uses
$\mathbf{Y}$, $\alpha\mathbf{Y}$ or $\sqrt{\alpha}\mathbf{Y}$ as input, the optimum minimum-angle decoder would be the same. Indeed,
Urbanke and Rimoldi JH], following Linder et al., have shown that as $N \to \infty$ a
suboptimum decoder for spherical lattice codes that does minimum-angle decoding to
the subset of codewords in a spherical shell of average energy $\approx S_x$ suffices to approach
capacity.



Of course, lattice decoding does depend on scale, so it seems that scaling the lattice 
decoder is just a trick to analyze the optimal minimum-angle decoder performance, as 
well as to show that lattice decoding of Voronoi codes suffices to reach capacity. 

Finally, note that with a fixed code and scaling by $\alpha$, as $N \to \infty$ the output $\alpha\mathbf{Y}$
almost surely lies in a sphere of average energy $\approx \alpha S_x < S_x$, inside $\mathcal{R}_V(\Lambda)$, so the mod-$\Lambda$
operation in the receiver has negligible effect and may be omitted. □

Remark 9 (Shannon codes, spherical lattice codes, and Voronoi codes). In Shannon's
random code ensemble for the AWGN channel, the code point $\mathbf{X}$ asymptotically lies
almost surely in a spherical shell of average energy per dimension $\approx S_x$, the received
vector $\mathbf{Y}$ lies almost surely in a spherical shell of average energy per dimension $\approx S_y$, and
the noise vector $\mathbf{N}$ lies almost surely in a spherical shell of average energy per dimension
$\approx S_n$. Thus we obtain a geometrical picture in which an "output sphere" of average energy
$\approx S_y$ is partitioned into $\approx (S_y/S_n)^{N/2}$ probabilistically disjoint "noise spheres" of squared
radius $\approx S_n$. Curiously, the centers of the noise spheres are at average energy $\approx S_x$, even
though practically all of the volumes of the noise spheres are at average energy $\approx S_y$.

Urbanke and Rimoldi JH| have shown that spherical lattice codes (the set of all
points in a lattice $\Lambda_c$ that lie within a sphere of average energy $S_x$) can achieve the
channel capacity $C = \frac{1}{2}\log_2 S_y/S_n$ b/d with minimum-distance decoding. Since again $\mathbf{Y}$
and $\mathbf{N}$ must lie almost surely in spheres of average energy $S_y$ and $S_n$, respectively, we
again have a picture in which the output sphere must be partitioned into $\approx (S_y/S_n)^{N/2}$
effectively disjoint noise spheres whose centers are the points in the spherical lattice code,
which have average energy $\approx S_x$.

Voronoi codes evidently work differently. The Voronoi region $\mathcal{R}_V(\Lambda)$ has average
energy $S_x$, and so does any good Voronoi code $\mathcal{C}((\Lambda_c + \mathbf{u})/\Lambda)$. Moreover, $\mathcal{R}_V(\Lambda)$ is
the disjoint union (mod $\Lambda$) of $V(\Lambda)/V(\Lambda_c) \approx (S_x/S_e)^{N/2}$ small Voronoi regions, whose
centers are the points in $\mathcal{C}((\Lambda_c + \mathbf{u})/\Lambda)$. So the centers have the same average energy as
the bounding region, in contrast to the spherical case.

By the sphere bound [121 HZ], $\log_2 V(\Lambda_c)^{2/N}/(2\pi e) \ge \log_2 S_c$, where $S_c$ is the channel
noise variance, so the capacity of the mod-$\Lambda$ channel is limited to $\frac{1}{2}\log_2 S_x/S_c$. If the
channel noise has variance $S_c = S_n$, then the capacity is limited to $\frac{1}{2}\log_2 S_x/S_n =
\frac{1}{2}\log_2 \mathrm{SNR}$, which is the best that de Buda and others [HJ HHj were able to achieve
with Voronoi codes prior to jS]. However, the MMSE estimator reduces the effective
channel noise variance to $S_c = S_e = \alpha S_n$, which allows the capacity to approach
$C = \frac{1}{2}\log_2 S_x/S_e = \frac{1}{2}\log_2(1 + \mathrm{SNR})$. So in the mod-$\Lambda$ setting the MMSE estimator
is the crucial element that precisely compensates for the Voronoi code capacity loss from
$C$ to the "lattice capacity" $\frac{1}{2}\log_2 \mathrm{SNR}$.

Finally, consider a "backward-channel" view of the Shannon ensemble. The jointly
Gaussian pair $(\mathbf{X}, \mathbf{Y})$ is equally well modeled by the forward-channel model $\mathbf{Y} = \mathbf{X} + \mathbf{N}$ or
the backward-channel model $\mathbf{X} = \alpha\mathbf{Y} + \mathbf{E}$. From the latter perspective, the transmitted
codeword $\mathbf{X}$ lies almost surely in a spherical shell of average energy $\approx S_e$ about the
scaled received word $\alpha\mathbf{Y}$, which lies almost surely in a spherical shell of average energy
$\approx \alpha^2 S_y = \alpha S_x$. Thus we obtain a geometrical picture in which an "input sphere" of
average energy $\approx S_x$ is partitioned into $\approx (S_x/S_e)^{N/2}$ probabilistically disjoint "decision
spheres" of squared radius $\approx S_e$. The centers of the decision spheres are codewords of
average energy $\approx S_x$.
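The energies appearing in the backward-channel picture can be estimated by a simple Monte Carlo experiment on a scalar Gaussian pair. This is a sketch under stated assumptions only: the function name `backward_channel_stats`, the sample size, the seed, and the values $S_x = 3$, $S_n = 1$ are arbitrary illustrative choices.

```python
import math
import random

def backward_channel_stats(s_x, s_n, trials=100_000, seed=0):
    """Estimate E[E*Y], E[E^2] and E[(alpha*Y)^2] for X = alpha*Y + E."""
    rng = random.Random(seed)
    alpha = s_x / (s_x + s_n)
    sum_ey = sum_e2 = sum_xhat2 = 0.0
    for _ in range(trials):
        x = rng.gauss(0.0, math.sqrt(s_x))
        y = x + rng.gauss(0.0, math.sqrt(s_n))   # forward channel Y = X + N
        e = x - alpha * y                        # backward-channel noise E
        sum_ey += e * y
        sum_e2 += e * e
        sum_xhat2 += (alpha * y) ** 2
    return sum_ey / trials, sum_e2 / trials, sum_xhat2 / trials

corr, s_e_hat, s_xhat_hat = backward_channel_stats(3.0, 1.0)
print(corr, s_e_hat, s_xhat_hat)
```

Under these assumptions the three estimates should be close to $0$, $S_e = \alpha S_n$ and $\alpha S_x$ respectively, so that the energies of $\alpha\mathbf{Y}$ and $\mathbf{E}$ add up to $S_x$, as in the "decision sphere" picture.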

Capacity-achieving Voronoi codes thus appear to be designed according to the backward- 
channel view of the Shannon ensemble, whereas capacity-achieving spherical lattice codes 
appear to be designed according to the forward-channel view. 